153 research outputs found

Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization

Author: Gao Tianmeng
Huang Chengwei
Liu Yun
Song Baolin
Publication date: 01/10/2017

In this paper we study the personalized text search problem. Keyword-based search in conventional algorithms is inefficient at understanding users' intentions because semantic meaning, user profiles, and user interests are not always considered. First, we propose a novel text search algorithm using an inverse filtering mechanism that is very efficient for label-based item search. Second, we adopt a Bayesian network to predict user interest for improved personalized search: given user input, it searches for related items using both keyword information and predicted user interest. Third, word vectorization is used to discover potential targets according to semantic meaning. Experimental results show that the proposed search engine has improved efficiency and accuracy and can operate on embedded devices with very limited computational resources.
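The word-vectorization step in the abstract above can be illustrated with a minimal sketch. It assumes cosine similarity as the matching score, which the abstract does not specify; the three-dimensional vectors and the names `TOY_VECTORS` and `rank_labels` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy embeddings; a real system would load trained word vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "movie": [0.9, 0.1, 0.0],
    "film": [0.85, 0.15, 0.05],
    "recipe": [0.0, 0.2, 0.95],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_labels(
    query: str, labels: List[str], vectors: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank item labels by semantic closeness to the query word."""
    q = vectors[query]
    scored = [(label, cosine(q, vectors[label])) for label in labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_labels("movie", ["recipe", "film"], TOY_VECTORS)
print(ranking[0][0])  # the label closest to "movie" in this toy space
```

With these toy vectors, a query for "movie" ranks the label "film" ahead of "recipe" even though the keywords do not match literally, which is the kind of semantic discovery the abstract describes.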

arXiv.org e-Print Archive

Diversity and distribution of physical dormant species in relation to ecosystem and life-forms

Author: Jaganathan Ganesh K.
Liu Baolin
Song Danping
Publication venue: 'Horizon E-Publishing Group'
Publication date: 01/04/2017

An impermeable seed/fruit coat, i.e. physical dormancy (PY), which occurs only in several genera of 18 angiosperm families, plays an important role in controlling seed persistence and germination timing. It has been theoretically speculated that PY is more prevalent in drylands than in moist vegetation zones, but unequivocal support for this assertion is currently unavailable. The broad objective of this contribution was to examine the distribution of PY across the various vegetation types of tropical and temperate ecosystems using a data set of 13,792 species. The proportion of species with PY in the tropics (19%) is higher than in temperate ecosystems (15%). However, in both the tropics and the temperate zone, there is a clear trend that PY is less common in moist and low-temperature vegetation zones than in dry and high-temperature vegetation. In the tropics, PY is more prevalent in dry woodlands (33%) and tropical deciduous forests (27.3%) than in the evergreen rain forest (9%). Similarly, in the temperate zone, dry vegetation with seasonal rainfall such as Matorral (22.3%) and deserts (19.5%) has a higher proportion of PY species than moist warm woodlands (8.1%) and deciduous forest (9%). Although PY is a trait found in various life-forms, it appears to be less common in trees, particularly those of the temperate zone. We discuss the ecological adaptation of PY in the dry ecosystem and consider the mechanisms of persistence and dormancy break in PY and physiologically dormant (PD) species.

Horizon e-Publishing Group (HePG): E-Journals

ValueNet: A New Dataset for Human Value Driven Dialogue System

Author: Gao Jianfeng
Li Jinchao
Lu Pan
Peng Baolin
Qiu Liang
Zhao Yizhou
Zhu Song-Chun
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 12/12/2021

Building a socially intelligent agent involves many challenges, one of which is to teach the agent to speak guided by its values, like a human. However, value-driven chatbots are still understudied in the area of dialogue systems. Most existing datasets focus on commonsense reasoning or social norm modeling. In this work, we present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios. The dataset is organized in ten dimensions that conform to the basic human value theory in intercultural research. We further develop a Transformer-based value regression model on ValueNet to learn the utility distribution. Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks. For example, by teaching a generative agent with reinforcement learning and the rewards from the value model, our method attains state-of-the-art performance on the personalized dialog generation dataset Persona-Chat. With values as additional features, existing emotion recognition models can capture rich human emotions in context, which further improves empathetic response generation performance on the EmpatheticDialogues dataset. To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first to try to incorporate a value model into emotionally intelligent dialogue systems. The dataset is available at https://liang-qiu.github.io/ValueNet/. Comment: Paper accepted by AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

Author: Liu Baolin
Liu Yan
Liu Ziqi
Song Ziyi
Wang Pengfei
Xiong Yongping
Yang Zongyuan
Zhou Junjie
Publication date: 13/08/2023

The goal of scene text image super-resolution is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. Existing methods that rely on optimizing a pixel-level loss tend to yield noticeably blurred text edges, which substantially harms both the readability and recognizability of the text. To address these issues, we propose TextDiff, the first diffusion-based framework tailored for scene text image super-resolution. It contains two modules: the Text Enhancement Module (TEM) and the Mask-Guided Residual Diffusion Module (MRD). The TEM generates an initial deblurred text image and a mask that encodes the spatial location of the text. The MRD is responsible for sharpening the text edges by modeling the residuals between the ground-truth images and the initial deblurred images. Extensive experiments demonstrate that TextDiff achieves state-of-the-art (SOTA) performance on public benchmark datasets and can improve the readability of scene text images. Moreover, the proposed MRD module is plug-and-play and effectively sharpens the text edges produced by SOTA methods. This enhancement improves the readability and recognizability of the results generated by SOTA methods without requiring any additional joint training. Code is available at https://github.com/Lenubolim/TextDiff

arXiv.org e-Print Archive

The Trickle-down Impact of Reward (In-)consistency on RLHF

Author: Chen Sihao
Jin Lifeng
Khashabi Daniel
Mi Haitao
Peng Baolin
Shen Lingfeng
Song Linfeng
Yu Dong
Publication date: 28/09/2023

Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable understudied subject is the (in-)consistency of RMs, that is, whether they can recognize semantic changes to different prompts and appropriately adapt their reward assignments, and the impact of this on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from RLHF model training? We propose Contrast Instructions, a benchmarking strategy for the consistency of RMs. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground-truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques, ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stages, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
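The consistency criterion described in the abstract above (the matching instruction-response pair should score higher than mismatched combinations) can be sketched as a simple check. This is an illustrative reading, not the benchmark's actual metric: it only tests the response side of each pair, and `toy_reward`, `ContrastPair` and the example pair are hypothetical stand-ins for a trained reward model and real benchmark data.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: two lexically similar instructions, each with its
# own ground-truth response: (instr_a, resp_a, instr_b, resp_b).
ContrastPair = Tuple[str, str, str, str]


def consistency_rate(
    reward: Callable[[str, str], float], pairs: List[ContrastPair]
) -> float:
    """Fraction of checks where the matching response outscores the other one."""
    checks = 0
    passed = 0
    for instr_a, resp_a, instr_b, resp_b in pairs:
        # For each instruction, compare its own response with the response
        # that belongs to the contrasting instruction.
        for instr, good, bad in ((instr_a, resp_a, resp_b), (instr_b, resp_b, resp_a)):
            checks += 1
            if reward(instr, good) > reward(instr, bad):
                passed += 1
    return passed / checks if checks else 0.0


def toy_reward(instruction: str, response: str) -> float:
    """Stand-in reward: number of lowercase words shared by both texts."""
    return float(len(set(instruction.lower().split()) & set(response.lower().split())))


PAIRS: List[ContrastPair] = [
    (
        "Name a red fruit",
        "An apple is a red fruit",
        "Name a green vegetable",
        "Spinach is a green vegetable",
    )
]

rate = consistency_rate(toy_reward, PAIRS)
print(rate)
```

A reward function that returns the same score for every input would pass none of these checks, since the comparison is strict.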

arXiv.org e-Print Archive

SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues

Author: Liang Yuan
Lu Pan
Peng Baolin
Qiu Liang
Wu Ying Nian
Yu Zhou
Zhao Yizhou
Zhu Song-Chun
Publication date: 24/06/2021

Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly. We model the social network as an And-or Graph, named SocAoG, for the consistency of relations among a group and leveraging attributes as inference cues. Moreover, we formulate a sequential structure prediction task, and propose an $\alpha$-$\beta$-$\gamma$ strategy to incrementally parse SocAoG for the dynamic inference upon any incoming utterance: (i) an $\alpha$ process predicting attributes and relations conditioned on the semantics of dialogues, (ii) a $\beta$ process updating the social relations based on related attributes, and (iii) a $\gamma$ process updating individual's attributes based on interpersonal social relations. Empirical results on DialogRE and MovieGraph show that our model infers social relations more accurately than the state-of-the-art methods. Moreover, the ablation study shows the three processes complement each other, and the case study demonstrates the dynamic relational inference. Comment: Long paper (oral) accepted by ACL-IJCNLP 202

arXiv.org e-Print Archive

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Author: Chang Kai-Wei
Cheng Hao
Galley Michel
Gao Jianfeng
Lu Pan
Peng Baolin
Wu Ying Nian
Zhu Song-Chun
Publication date: 19/04/2023

Large language models (LLMs) have achieved remarkable progress in various natural language processing tasks with emergent abilities. However, they face inherent limitations, such as an inability to access up-to-date information, utilize external tools, or perform precise mathematical reasoning. In this paper, we introduce Chameleon, a plug-and-play compositional reasoning framework that augments LLMs to help address these challenges. Chameleon synthesizes programs to compose various tools, including LLM models, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM as a natural language planner, Chameleon infers the appropriate sequence of tools to compose and execute in order to generate a final response. We showcase the adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP. Notably, Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%; using GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the state-of-the-art model, leading to a 98.78% overall accuracy on TabMWP. Further studies suggest that using GPT-4 as a planner exhibits more consistent and rational tool selection and is able to infer potential constraints given the instructions, compared to other LLMs like ChatGPT. Comment: 25 pages, 10 figures. Project page: https://chameleon-llm.github.i
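The plan-then-execute idea in the abstract above (a planner chooses a sequence of tools, which are then run in order) can be sketched as follows. This is a toy illustration under stated assumptions, not Chameleon's implementation: the rule-based `toy_planner` stands in for the LLM planner, and `lookup`, `answer`, `KNOWLEDGE` and `run_plan` are hypothetical names.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
Tool = Callable[[Context], Context]

# Hypothetical knowledge source standing in for a search or retrieval tool.
KNOWLEDGE: Dict[str, str] = {"capital of france": "Paris"}


def lookup(ctx: Context) -> Context:
    """Tool: attach evidence for the query, if any is known."""
    return {**ctx, "evidence": KNOWLEDGE.get(ctx["query"].lower(), "")}


def answer(ctx: Context) -> Context:
    """Tool: produce the final answer from whatever evidence is in the context."""
    return {**ctx, "answer": ctx.get("evidence") or "unknown"}


TOOLS: Dict[str, Tool] = {"lookup": lookup, "answer": answer}


def toy_planner(query: str) -> List[str]:
    """Rule-based stand-in for an LLM planner: returns tool names in order."""
    return ["lookup", "answer"] if query.lower() in KNOWLEDGE else ["answer"]


def run_plan(query: str, tools: Dict[str, Tool]) -> Context:
    """Execute the planned tools sequentially, threading a shared context."""
    ctx: Context = {"query": query}
    for name in toy_planner(query):
        ctx = tools[name](ctx)
    return ctx


result = run_plan("capital of France", TOOLS)
print(result["answer"])
```

Each tool reads from and writes to a shared context, so new tools can be registered in `TOOLS` without changing the executor, which is one plausible reading of "plug-and-play" here.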

arXiv.org e-Print Archive